NSF PAR Search | NSF Public Access Repository

CounterCurate: Enhancing Physical and Semantic Visio-Linguistic Compositional Reasoning via Counterfactual Examples

Zhang, Jianrui; Cai, Mu; Xie, Tengyang; Lee, Yong Jae (August 2024, Findings of the Association for Computational Linguistics (ACL Findings))

Full Text Available
Interpretable Preferences via Multi-Objective Reward Modeling and Mixture-of-Experts

https://doi.org/10.18653/v1/2024.findings-emnlp.620

Wang, Haoxiang; Xiong, Wei; Xie, Tengyang; Zhao, Han; Zhang, Tong (May 2024, Association for Computational Linguistics)

Full Text Available
Adversarially Trained Actor Critic for Offline Reinforcement Learning

Cheng, Ching-An; Xie, Tengyang; Jiang, Nan; Agarwal, Alekh (July 2022, Proceedings of the 39th International Conference on Machine Learning)

We propose Adversarially Trained Actor Critic (ATAC), a new model-free algorithm for offline reinforcement learning (RL) under insufficient data coverage, based on the concept of relative pessimism. ATAC is designed as a two-player Stackelberg game: a policy actor competes against an adversarially trained value critic, which finds data-consistent scenarios where the actor is inferior to the data-collection behavior policy. We prove that, when the actor attains no regret in the two-player game, running ATAC produces a policy that provably 1) outperforms the behavior policy over a wide range of hyperparameters that control the degree of pessimism, and 2) competes with the best policy covered by data with appropriately chosen hyperparameters. Notably, compared with existing work, our framework offers both theoretical guarantees for general function approximation and a deep RL implementation scalable to complex environments and large datasets. In the D4RL benchmark, ATAC consistently outperforms state-of-the-art offline RL algorithms on a range of continuous control tasks.
Full Text Available
Bellman-consistent Pessimism for Offline Reinforcement Learning

Xie, Tengyang; Cheng, Ching-An; Jiang, Nan; Mineiro, Paul; Agarwal, Alekh (December 2021, Advances in Neural Information Processing Systems (selected for oral presentation))

Full Text Available
A Block Coordinate Ascent Algorithm for Mean-Variance Optimization

Xie, Tengyang; Liu, Bo; Xu, Yangyang; Ghavamzadeh, Mohammad; Chow, Yinlam; Lyu, Daoming; Yoon, Daesub (January 2018, Advances in Neural Information Processing Systems)

Full Text Available
